Input space reduction for Rule Based Classification
Abstract
Rule-based classification is one of the most popular approaches to classification in data mining. Of the many rule-based algorithms, C4.5 and Partial Decision Tree (PART) are among the most widely used, and both offer practical features such as categorization of continuous values and missing-value handling. In many cases, however, these algorithms take a long time to run and achieve a low rate of correctly classified instances. One of the main reasons is the high dimensionality of the data: a large dataset may contain hundreds of attributes and a very large number of instances. To obtain higher accuracy, the most relevant attributes must be chosen from among them. Choosing an algorithm that classifies both efficiently and accurately is also difficult. Our proposed method selects the most relevant attributes of a dataset, reducing the input space and at the same time improving the performance of these two rule-based algorithms. Improvement is measured as higher accuracy and lower computational complexity. We use entropy, from information theory, to identify the central attribute of a dataset, and then apply a correlation coefficient (Pearson's, Spearman's, or Kendall's) computed against the central attribute of the same dataset. We conducted a comparative study of these three popular correlation measures to choose the best one. The datasets were taken from the well-known UCI (University of California, Irvine) data repository, and box plots are used to compare the experimental results. The proposed method showed better performance in most of the individual experiments.
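The selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how entropy determines the central attribute or how correlated an attribute must be to be kept, so the sketch assumes the central attribute is the one with the highest Shannon entropy and keeps attributes whose absolute correlation with it reaches a hypothetical `threshold`. The function names and the tau-a variant of Kendall's coefficient are likewise illustrative choices.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a discrete attribute."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either attribute is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def ranks(x):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def kendall(x, y):
    """Kendall tau-a: (concordant - discordant) / number of pairs."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)

def select_attributes(data, measure=pearson, threshold=0.5):
    """data maps attribute name -> list of numeric values.

    ASSUMPTION: central attribute = highest entropy; `threshold` is a
    hypothetical cut-off, not a value taken from the paper.
    """
    central = max(data, key=lambda name: entropy(data[name]))
    kept = [name for name in data
            if name == central
            or abs(measure(data[name], data[central])) >= threshold]
    return central, kept

# Toy data (made up for illustration only).
toy = {
    "a": [1, 2, 3, 4, 5, 6],
    "b": [2, 4, 6, 8, 10, 10],
    "c": [1, 1, 2, 2, 1, 1],
}
central, kept = select_attributes(toy, measure=spearman)
print(central, kept)
```

Passing `pearson`, `spearman`, or `kendall` as `measure` mirrors the comparison of the three coefficients; the reduced attribute list would then be handed to a rule-based learner such as C4.5 or PART.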
Similar resources
A Margin-based Model with a Fast Local Search for Rule Weighting and Reduction in Fuzzy Rule-based Classification Systems
Fuzzy Rule-Based Classification Systems (FRBCS) are widely investigated by researchers because of their noise stability and interpretability. Unfortunately, generating a rule base that is both sufficiently accurate and interpretable is a hard process. Rule weighting is one of the approaches to improving the accuracy of a pre-generated rule base without modifying the original rules. Most of the pro...
S3PSO: Students’ Performance Prediction Based on Particle Swarm Optimization
Nowadays, new methods are required to take advantage of the rich and extensive gold mine of data given the vast content of data particularly created by educational systems. Data mining algorithms have been used in educational systems especially e-learning systems due to the broad usage of these systems. Providing a model to predict final student results in educational course is a reason for usi...
A QUADRATIC MARGIN-BASED MODEL FOR WEIGHTING FUZZY CLASSIFICATION RULES INSPIRED BY SUPPORT VECTOR MACHINES
Recently, tuning the weights of the rules in Fuzzy Rule-Base Classification Systems is researched in order to improve the accuracy of classification. In this paper, a margin-based optimization model, inspired by Support Vector Machine classifiers, is proposed to compute these fuzzy rule weights. This approach not only considers both accuracy and generalization criteria in a single objective fu...
Determining Effective Features for Face Detection Using a Hybrid Feature Approach
Detecting faces in cluttered backgrounds and real-world scenes remains an unsolved problem. In this paper, by combining several kinds of independent features with one of the most common appearance-based approaches and multilayer perceptron (MLP) neural networks, not only have some questions been answered, but the designed system also achieved better performance than the pre...
Fisher Discriminant Analysis (FDA), a supervised feature reduction method in seismic object detection
Automatic processes on seismic data using pattern recognition is one of the interesting fields in geophysical data interpretation. One part is the seismic object detection using different supervised classification methods that finally has an output as a probability cube. Object detection process starts with generating a pickset of two classes labeled as object and non-object and then selecting ...
Question Classification Using a Combination of Classifiers
Question answering systems are developed to provide exact answers to questions posed in natural language. One of the most important parts of a question answering system is question classification, whose purpose is to predict the kind of answer a natural-language question requires. The literature works can be categorized as rule-based and learning...